Abstract: VLSI technologists are fast developing wafer-scale integration. Rather than
partitioning a silicon wafer into chips as is usually done, the idea behind wafer-scale integration is
to assemble an entire system (or network of chips) on a single wafer, thus avoiding the costs and
performance loss associated with individual packaging of chips. A major problem with assembling
a large system of microprocessors on a single wafer, however, is that some of the processors, or
cells, on the wafer are likely to be defective. In this paper, we describe practical procedures for
integrating wafer-scale systems "around" such faults. The procedures are designed to minimize
the length of the longest wire in the system, thus minimizing the communication time between
cells. Although the underlying network problems are NP-complete, we prove that the procedures
are  reliable  by  assuming  a  probabilistic  model  of  cell  failure.  We  also  discuss  applications  of 
this  work  to  problems  in  VLSI  layout  theory,  graph  theory,  fault-tolerant  systems  and  planar 

VLSI technologists are fast developing wafer-scale integration [30]. Rather than partitioning
a silicon wafer into chips as is usually done, the idea behind wafer-scale integration is to assemble
an entire system (or network of chips) on a single wafer, thus avoiding the costs and performance
loss associated with individual packaging of chips. A major problem with assembling a large
system of microprocessors on a single wafer, however, is that some of the processors, or cells,
on the wafer are likely to be defective. Thus a practical procedure for integrating wafer-scale
systems must have the ability to configure networks "around" such faults.

This  paper  considers  a  variety  of  problems  involving  the  construction  of  systolic  arrays  [15]. 
Systolic  arrays  are  a  desirable  architecture  for  VLSI  because  all  communication  is  between  nearest 
neighbors.  In  a  wafer-scale  system,  however,  all  the  nearest  neighbors  of  a  processor  may  be  dead, 
and  thus  the  prime  advantage  of  adopting  a  systolic  array  architecture  may  be  lost  if  a  long  wire 
connects  adjacent  processors.  In  general,  the  longest  interconnection  between  processors  will  be 
a  communication  bottleneck  in  the  system.  Of  the  many  possible  ways  in  which  the  live  cells  on 
a wafer can be connected to form a systolic array, therefore, the one that minimizes the length
of the longest wire is most desirable from a computational standpoint because communication
overhead is least.


To  illustrate  the  subtleties  inherent  in  configuring  systolic  arrays,  consider  the  problem  of 
constructing a linear (i.e., one-dimensional) array using all of the live cells in an $N$-cell wafer.
Unfortunately, if we wish to minimize the length of the longest wire, the problem is NP-complete
[11]. Even more discouraging is that there are some arrangements of live and dead cells for which
even the optimal linear array has unacceptably long wires. Thus optimal solutions, even if they
could be found quickly, are not always practical.

By assuming a probabilistic model of cell failure, however, many positive results can be proved.
For example, Figure 1 illustrates a possible solution to the problem of connecting the live cells
of a wafer into a linear systolic array. The live cells, which are denoted by small squares, are
connected together, one after another, in a snake-like pattern. Dead cells, denoted by X's, are
skipped over. With probability $1 - O(1/N)$, the length of the longest wire is $O(\lg N)$, where $N$
is the number of cells in the wafer and where each cell independently has a fifty percent chance
of failure.

Figure 1. A simple means of constructing a linear systolic array from the live cells on a
wafer.

This bound comes from the observation that the length of the longest wire that connects two
cells in the array is just the length of the longest sequence of dead cells in the snake-like string.
For a given set of $k$ cells, the probability that all are dead is $1/2^k$, and thus the probability that
any given set of $2 \lg N$ cells are dead is $1/N^2$. Because the snake-like path of length $N$ contains
fewer than $N$ runs of $2 \lg N$ consecutive cells, the chances are less than one in $N$ of having
to skip more than $2 \lg N$ cells anywhere along the path, and thus with probability
$1 - O(1/N)$ the maximum wire length is $O(\lg N)$.
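
The snake-like scheme is easy to try out numerically. The following is a minimal sketch, not the authors' implementation: the function names (`snake_order`, `longest_dead_run`, `random_wafer`), the square wafer shape, and the fixed random seed are all assumptions made for illustration. It walks the wafer in snake-like order and reports the longest run of consecutive dead cells, which bounds the longest wire from above.

```python
import math
import random


def snake_order(rows, cols):
    """Yield (row, col) positions in snake-like (boustrophedon) order."""
    for r in range(rows):
        columns = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        for c in columns:
            yield (r, c)


def longest_dead_run(alive, rows, cols):
    """Longest run of consecutive dead cells met along the snake-like path."""
    longest = current = 0
    for r, c in snake_order(rows, cols):
        if alive[r][c]:
            current = 0
        else:
            current += 1
            longest = max(longest, current)
    return longest


def random_wafer(rows, cols, p_fail=0.5, seed=0):
    """Hypothetical wafer model: each cell fails independently with p_fail."""
    rng = random.Random(seed)
    return [[rng.random() >= p_fail for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rows = cols = 32
    wafer = random_wafer(rows, cols)
    print("longest dead run:", longest_dead_run(wafer, rows, cols))
    print("2 lg N          :", 2 * math.log2(rows * cols))
```

Running the script with different seeds gives a feel for how rarely the longest run exceeds $2 \lg N$ when $p = 1/2$.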

To say that with probability $1 - O(1/N)$ the maximum wire length is $O(\lg N)$ is a substantially
stronger statement than saying that the expected maximum wire length is $O(\lg N)$. Not only is
the expected maximum wire length $O(\lg N)$, but the chances of it being much larger are minuscule.
Furthermore, the probability can be made much higher. For example, the probability of having
to skip more than $3 \lg N$ dead cells in the entire snake-like path is less than one in $N^2$. A small
adjustment to the constant within the Big Oh results in a much higher probability.
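
The trade-off between the constant and the probability can be written out as a union bound. In this sketch the symbol $c$ is introduced only for illustration and does not appear in the original; the failure probability is taken to be $1/2$ as in the text.

```latex
\Pr\bigl[\text{some run of } c \lg N \text{ consecutive cells on the path is all dead}\bigr]
  \;\le\; N \cdot 2^{-c \lg N} \;=\; N^{1-c}.
```

Setting $c = 2$ gives the bound of one in $N$, and $c = 3$ gives one in $N^2$.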

Not surprisingly, there are algorithms which, under similar assumptions of cell failure, produce
far better results than the algorithm illustrated in Figure 1. For example, we will describe in
Section 3 another simple procedure which, with high probability, constructs a linear array using
wires of length $O(\sqrt{\lg N})$. We will also show that, up to the leading constant, the algorithm is
the best possible of its kind. By relaxing the constraint that all live cells be connected into the
linear array, however, we can do much better. In fact, we will also show in Section 3 that with
high probability, a linear array containing any constant fraction (less than one) of the live cells
on an $N$-cell wafer can be constructed using wires of at most constant length.

Although there are numerous uses for linear systolic arrays [22], two-dimensional systolic
arrays are also important. Not only can the two-dimensional array be used as a powerful
communications structure for parallel computation [15], but it can also serve as an all-purpose
structure in which arbitrary networks can be embedded [2, IS, 21, 41, 43]. As one might expect,
the problem of constructing a two-dimensional array from the live cells of a wafer is more difficult
than the corresponding problem for linear arrays. Specifically, Section 4 contains a proof that
with high probability a two-dimensional array that uses any constant fraction of the live cells
must have wires of length $\Omega(\sqrt{\lg N})$.

Although we do not know how to construct two-dimensional arrays from most of the live cells
using wires of length $O(\sqrt{\lg N})$ or channels of constant width, we can come close. We show in
Section 6 that with high probability, a two-dimensional array can be constructed on an $N$-cell
wafer using:

1) all the live cells with wires of length $O(\lg N \lg\lg N)$ and channels of width $O(\lg\lg N)$,

2) any constant fraction less than one of the live cells with wires of length $O(\sqrt{\lg N} \lg\lg N)$
and channels of width $O(\lg\lg N)$, and

3) at least $\Omega(1/\lg\lg^2 N)$ of the live cells with wires of length $O(\sqrt{\lg N})$ and channels of
width 1.

The remainder of the paper is divided into seven sections. Section 2 more formally describes
our model for wafer-scale integration and discusses the practicality of the modeling assumptions.
The algorithms for constructing linearly connected systolic arrays are presented in Section 3.
Section 4 contains the lower bound result for wire length in two-dimensional systolic arrays. In
Section 5 we present a worst-case (nonprobabilistic) upper bound on the channel width necessary
to configure a two-dimensional array. This result has application to the fault-tolerant encoding of
two-dimensional arrays in complete binary trees [31]. Section 6 gives algorithms for constructing
two-dimensional arrays in the probabilistic model. In Section 7, we mention some related problems
in geometric complexity and graph theory. The related problems are nice theoretically in that
most of them have tight upper and lower bounds. They also suggest a wealth of interesting
questions concerning the design of fault-tolerant systems. We conclude the paper with some
additional remarks in Section 8.


2. The wafer-scale model

Laser-programming  the  interconnect  of  a  wafer  is  a  promising  means  of  achieving  wafer-scale 
integration. This technology was pioneered at IBM [24] and pursued in the direction of wafer-scale
integration at MIT Lincoln Laboratory [30]. Figure 2 shows a scanning electron microscope
photograph  of  a  portion  of  a  wafer  with  programmable  interconnect.  Laser  welds  can  be  made 
between  two  layers  of  metal,  and  by  using  the  beam  at  somewhat  higher  power,  wires  can  be  cut. 
Defective  components  can  thus  be  avoided  by  programming  the  interconnect  to  connect  only  the 
good  components. 


Figure 2. A close-up of laser-programmable interconnect.


Figure 3 shows a typical organization of a wafer-scale system with programmable
interconnections. The components are organized as a matrix of cells, and between the cells are channels
through which the interconnect runs. Figure 4 is a close-up of the channel structure. At the
intersection of a horizontal and vertical channel, laser-programmable connections can make a
horizontal and a vertical wire electrically equivalent. Between two cells, connections can be made
from the wires in the channel to the inputs and outputs of the two cells. Given that the
interconnect is programmable, we shall adopt a usage of the term "wire" to mean an electrically
equivalent portion of the programmable interconnect.

Figure 3. A wafer-scale system of cells and programmable interconnect.

Figure 4. The channel structure of a wafer-scale system.


The preassignment of wire segments to layers such that wires in one layer run horizontally
and the other vertically is called Manhattan wiring [16]. This wiring model has been studied
extensively, but in this paper the details of the wiring are not the central issue. It will be sufficient
to understand one fact about Manhattan wiring: the width of a channel need only be a constant
factor larger than the maximum number of wires that occupy any portion of the channel.

A natural question to ask about the use of programmable interconnections to avoid defective
cells is, "If cells are unreliable, why might not the interconnect fail also?" The answer is that,
indeed, interconnect does fail. But the reliability of the interconnect is much higher than the
reliability of the cells. The interconnect in the MIT Lincoln Laboratory project, for example,
takes three masking steps to fabricate, but manufacturing the active devices requires well over a
dozen steps. This project is targeting yields of fifty percent for cells and over ninety-five percent
for wires. And even if a wire fails at one point, it is often possible to break it into two usable
pieces.

In  this  paper  we  shall  assume  that  the  interconnect  has  sufficient  redundancy  so  that  the 
inability  to  interconnect  cells  arbitrarily  is  a  rare  phenomenon.  In  this  sense,  we  are  making  the 
same  assumption  that  is  used  to  substantiate  redundancy  in  any  fault-tolerant  system.  The  idea 
is  not  that  the  system  will  be  completely  reliable,  but  that  its  failure  will  depend  on  the  failure 
of  the  most  reliable  component  instead  of  the  least  reliable  component. 

Another  assumption  that  must  be  examined  more  closely  is  that  the  probability  of  cell  failure 
is independent and the same for all cells. Failures can be attributed to one of two causes:
materials defects during manufacturing, and mask misalignment. Materials defects are spread
uniformly, but the size of the region affected by a defect is a separate random variable. This
means  that  if  one  point  on  the  wafer  is  flawed,  neighboring  points  are  also  likely  to  be  flawed. 
Nevertheless,  independence  of  cell  failures  is  quite  a  reasonable  assumption  because  the  area  of 
a  cell  is  substantially  larger  than  the  expected  area  of  a  defect. 

Mask misalignment is a somewhat more serious problem with respect to our modeling
assumptions. The reason is that misalignment is a global failure mode. Misalignment due to translation
of the axes of one mask relative to the others poses no real problem in terms of the modeling
assumptions, however, because the effect is the same for all cells. The real problem is misalignment
due  to  angular  rotation  of  one  mask  with  respect  to  the  others.  Those  cells  near  the  center  of 
rotation  are  much  more  likely  to  be  good  than  those  far  from  the  center.  Experimental  evidence 
indicates,  however,  that  the  effects  from  angular  rotation  that  cannot  be  accounted  for  by  our 
model  are  minimal. 

The two cost functions we shall examine in this paper are channel width and maximum wire
length. Minimizing channel width is important because the available wafer area is essentially
fixed. If the channel width is large, the size of the system, and hence its functionality, is reduced.
In addition, large channel widths often lead to long wires, and minimizing the length of the
longest wire is our other cost criterion.

Minimizing the length of the longest wire in a wafer-scale system is important because
communication delays can be the limiting factor of the performance of the system. Since both
resistance and capacitance increase with the length of wire, the time required to drive a wire can
grow as fast as the square of the length of the wire [27]. (See [4] for a discussion of propagation
delays through wires.) In particular, a designer who chooses a two-dimensional systolic array
architecture is counting on low overhead for communication, and will not want communication
down a long wire to degrade the performance of the system. Furthermore, for reasons of electrical
correctness, cells must be designed with signal buffers capable of driving the maximum length
wire. Since the size of buffers varies with the size of the load being driven, substantial area in a
cell can be saved if the maximum length wire is known to be short. As was argued previously,
this savings in area translates to larger systems with greater functionality.

Throughout the paper, we will consider cells which occupy an $s$-by-$s$ square region on the
wafer and which have (independently) a probability $p$ of failure. Unless specifically stated to the
contrary, we will assume for simplicity that $s = 1$ and $p = 1/2$. As we will later observe, these
restrictions have little bearing on the analysis. In addition, we will use the term "high probability"
to mean "with probability at least $1 - O(1/N)$," where $N$ is the number of cells on the wafer.

We conclude this section with a simple result that places the rest of this paper in a proper
context. Given a circuit composed of active components and wires, it is possible to construct
a wafer of not much more area (asymptotically) which is fault tolerant. If there are $N$ active
components, expand the layout of the circuit in each dimension by $c\sqrt{\lg N}$, where $c$ is a constant
chosen large enough that $2 \lg N$ copies of a given active component fit in the space designated to
that component in the original circuit. The probability that every one of the $2 \lg N$ copies is bad
is $1/N^2$, and thus with high probability, one of the copies of every component is good. It only
remains to hook them up in the space left for wires.

This scheme works even if components are different. The results in this paper are better
for systolic arrays, however, because we can utilize substantially more of the live cells at less
cost. Since the number of cells on a wafer might typically be between 100 and 1000, $\lg N$ is a
considerable fraction of $N$. Some of our algorithms use all of the live cells, and others use a
considerable proportion.


3. Wafer-scale integration of linearly connected systolic arrays


The snake-like scheme described in the introduction connects with high probability all the
live cells on an $N$-cell wafer into a linear array with wires of length at most $O(\lg N)$. This section
substantially improves and generalizes this result. We commence by showing that this bound can
be improved to $O(\sqrt{\lg N})$, which is optimal to within a constant factor.

Theorem 1. With probability $1 - O(1/N)$, the live cells on an $N$-cell wafer can be connected
in a linear array using wires of length $O(\sqrt{\lg N})$. Up to the leading constant, this bound is the best
possible.


Proof. We first show how to construct a linear array using wires of length $O(\sqrt{\lg N})$. Partition
the wafer into square regions containing $2 \lg N$ cells each as is shown by the dashed lines in Figure
5. The probability that all of the $2 \lg N$ cells are dead in one or more of the squares is at most

$$\frac{N}{2 \lg N} \cdot 2^{-2 \lg N} = \frac{1}{2 N \lg N},$$

which is less than $1/N$. Thus with probability $1 - O(1/N)$, each of the squares contains at least
one live cell.

Construct a linear array out of the live cells in each square using the "transpose" of the
algorithm from Section 1, except that when an empty column is encountered, the column is
skipped. In Figure 5, these connections are shown with solid lines. Since any pair of cells in the
same square can be linked with a wire of length at most $2\sqrt{2 \lg N}$, the wires in each array have
length $O(\sqrt{\lg N})$. Next, add wires, shown by dotted lines in the figure, which connect the small
arrays into one large array. Because each region contains at least one live cell, these connections
can be made with wires of length at most $3\sqrt{2 \lg N}$. Thus every wire in the completed linear
array has length $O(\sqrt{\lg N})$ with high probability.
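
The precondition of this construction, that every square region holds at least one live cell, can be checked directly. The sketch below is illustrative only: `block_side` and `blocks_without_live_cell` are hypothetical helper names, the wafer is assumed to be a list of rows of booleans, and blocks at the right and bottom edges may be truncated when the wafer side is not a multiple of the block side.

```python
import math


def block_side(n_cells):
    """Side length of a square block that holds at least 2 lg N cells."""
    return math.ceil(math.sqrt(2 * math.log2(n_cells)))


def blocks_without_live_cell(alive, side):
    """Return the top-left corners of all blocks that contain no live cell.

    `alive` is a list of rows of booleans (True means the cell is live).
    Edge blocks may be smaller than side-by-side.
    """
    rows, cols = len(alive), len(alive[0])
    empty = []
    for r0 in range(0, rows, side):
        for c0 in range(0, cols, side):
            block = (alive[r][c]
                     for r in range(r0, min(r0 + side, rows))
                     for c in range(c0, min(c0 + side, cols)))
            if not any(block):
                empty.append((r0, c0))
    return empty
```

If the returned list is empty, every block can contribute a small snake-like array, and neighboring blocks can be stitched together as described in the proof.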

That the bound cannot be improved by more than a constant factor is due to the observation
that with high probability, some live cell will be at the center of a region of $\Omega(\lg N)$ dead cells.
Thus a wire of length $\Omega(\sqrt{\lg N})$ will be required to link the isolated live cell to any other live
cell. To demonstrate this bound more formally, we again partition the wafer into square regions,
but this time the squares are rotated by forty-five degrees in the plane to form diamond-shaped
regions containing $\lg N - 2 \lg\lg N$ cells each, as is shown in Figure 6.


Figure 6. An example of an isolated cell.

Suppose a linear array can be constructed using wires of length at most $\frac{1}{2}\sqrt{\lg N} - 1$. Then
in any given diamond, the center cell is not the only live cell in the diamond. The probability
that every diamond avoids this condition is at most

$$\left(1 - 2^{-\lg N + 2 \lg\lg N}\right)^{N/(\lg N - 2 \lg\lg N)} = \left(1 - \frac{\lg^2 N}{N}\right)^{N/(\lg N - 2 \lg\lg N)} < e^{-\lg^2 N/(\lg N - 2 \lg\lg N)} < \frac{1}{N}.$$

Thus the probability that the optimal linear array has a wire of length $\Omega(\sqrt{\lg N})$ is at least
$1 - O(1/N)$.

If all the cells are incorporated in a linear array, then the maximum wire length is $\Theta(\sqrt{\lg N})$
with high probability. But the proof of the lower bound suggests that isolated cells induce the
long wires. Instead of insisting that all live cells be incorporated in the linear array, suppose we
only require that most of the live cells be included. A linear array that incorporates most of the
live cells can be constructed with constant-length wires. The proof is indirect, and depends on
the following lemma. (The lemma is essentially equivalent to the result of Sekanina [36] which
states that the cube of a nontrivial connected graph always has a Hamiltonian circuit. This result
was later reproved by Karaganis [12] and Rosenberg and Snyder [35].)

Lemma 2. A spanning tree $T$ with maximum wire length $L$ can be transformed into a linear
array with maximum wire length $6L$.

Proof. We show that, without regard for wire widths, the linear array can be constructed
using wires of length $3L$ by tracing over wires in $T$ no more than twice. The larger $6L$ bound
comes because the channel widths need to be doubled to accommodate the extra wires.

Choose a node $v$ to be the root of $T$, and let $T_1, T_2, \ldots, T_m$ be the subtrees of $v$ as is shown
in Figure 7. (Degenerate cases not like Figure 7 are readily handled, but we do not include the
details here.)

Figure 7. Constructing a linear array from a spanning tree.

Assume as an inductive hypothesis that we have constructed linear arrays on the nodes of
$T_1, T_2, \ldots, T_m$ such that no wire has length greater than $3L$, and so that the end points of the
array in $T_i$ are $v_i$ and $u_i$ for $1 \le i \le m$. Join the arrays in the subtrees by adding the following
wires: $(v, u_1), (v_1, u_2), (v_2, u_3), \ldots, (v_{m-1}, u_m)$. (These wires are shown as dashed lines in
Figure 7.) Each of these wires has length at most $3L$, and the resulting network is a linear array
on the nodes of $T$ with endpoints $v$ and $v_m$, which completes the induction. For completeness,
we remark that the basis of the induction is easily verified. ∎
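
The induction in the proof of Lemma 2 translates into a short recursion. The following is a sketch under stated assumptions rather than the authors' code: the tree is assumed to be given as a dictionary mapping each node to a list of its children, the names `linear_order`, `tree_distance`, and `max_hop` are hypothetical, and distances are counted in tree edges, so a hop of $h$ edges corresponds to a wire of length at most $hL$.

```python
def linear_order(tree, root):
    """Order the nodes so that consecutive nodes are at most 3 tree edges apart.

    The array on a subtree starts at its root and ends at the root of its
    last child subtree. Reversing each child's array before appending it
    reproduces the joining wires (v, u_1), (v_1, u_2), ... of the proof.
    """
    order = [root]
    for child in tree.get(root, []):
        order.extend(reversed(linear_order(tree, child)))
    return order


def tree_distance(tree, root, a, b):
    """Number of tree edges on the path between nodes a and b."""
    parent = {root: None}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in tree.get(node, []):
            parent[child] = node
            stack.append(child)

    def ancestors(x):
        chain = []
        while x is not None:
            chain.append(x)
            x = parent[x]
        return chain

    up_a, up_b = ancestors(a), ancestors(b)
    steps_from_a = {node: i for i, node in enumerate(up_a)}
    for j, node in enumerate(up_b):
        if node in steps_from_a:
            return steps_from_a[node] + j
    raise ValueError("nodes are not in the same tree")


def max_hop(tree, root, order):
    """Largest tree distance between consecutive nodes of the order."""
    return max(tree_distance(tree, root, a, b)
               for a, b in zip(order, order[1:]))
```

For a tree with at least two nodes, `max_hop(tree, root, linear_order(tree, root))` should never exceed 3, which matches the $3L$ bound before channel widths are doubled.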


The  problem  of  constructing  a  linear  array  with  constant  maximum  wire  length  that  contains 
most  of  the  live  cells  has  now  been  reduced  to  the  problem  of  constructing  a  spanning  tree  with 
constant  maximum  wire  length  that  contains  most  of  the  live  cells.  The  next  lemma  shows  that 
such  a  spanning  tree  can  be  formed  with  high  probability. 

Lemma 3. There exists a positive constant $c$ such that for any $d$ (which might be a function of
$N$), with probability $1 - O(1/N)$, at least $1 - O(2^{-cd^2})$ of the live cells on an $N$-cell wafer can
be connected in a spanning tree using wires of length at most $d$. Up to constants, this is the best
possible bound.

Proof. We first show that up to constants, the bound is the best that one could hope for. In
fact, we show something stronger: that for any constant $k > 2$, with probability $1 - O(1/N)$,
no more than $1 - O(2^{-kd^2})$ of the live cells on an $N$-cell wafer can be connected in any network
using wires of length at most $d$. The proof is based on showing that with high probability, there
are $\Omega(N/(d^2 2^{2d^2}))$ live cells, each of which is located at the center of a region of dead cells whose
radius is at least $d$.

Partition the wafer into diamond-shaped regions as was done in Figure 6 to prove the lower
bound of Theorem 1, except make the size of each region be $2d^2$ cells. The probability that any
particular region consists of an isolated live cell at the center of $2d^2 - 1$ dead cells is $2^{-2d^2}$. The
probability that $T$ or fewer of the $N/2d^2$ regions are like this is thus

$$< e^{-N/(2d^2 2^{2d^2})} \sum_{z=0}^{T} \frac{\left(N/(d^2 2^{2d^2})\right)^{z}}{z!}.$$

When $T$ assumes the value $N/(8d^2 2^{2d^2})$, the largest term in the series occurs for $z = T$, and thus
the preceding expression can be bounded above by

$$e^{-N/(2d^2 2^{2d^2})} \, (T+1) \, \frac{\left(N/(d^2 2^{2d^2})\right)^{T}}{T!} = O(1/N).$$


In order to prove the upper bound, consider the graph whose vertices are live cells on the
wafer and whose edges connect cells which are within distance $d$ of each other on the wafer. In
what follows, we will show that there is one main connected component in this graph, and that
the total size of all other isolated components is a small fraction of $N$. More specifically, we will
show that there exist constants $c_1$ and $c$ such that the probability that more than $c_1 2^{-cd^2} N$ live
cells are isolated is $O(1/N)$.

The approach will be to find a crude upper bound on the number of paths which can define
the outer boundary of an isolated region. (See Figure 8.) For any given path which defines the
outer boundary of a potentially isolated region, we will show that the probability is very small
that all the cells are dead in the corresponding width-$d$ boundary region. In particular, the longer
the path that defines a potentially isolated region, the smaller the probability that the region is
actually isolated.


Figure  8.  Examples  of  isolated  regions. 


Because there are $N$ positions at which a path can start and at most four ways it can continue
at each step, there are at most $N4^r$ paths consisting of $r$ consecutive cells. Thus there are at
most

$$\binom{N4^r}{k} \le \frac{N^k 4^{rk}}{k!}$$

sets of $k$ different paths of length $r$.

The number of paths of length $r$ is quite a formidable number, and at first glance it seems
unlikely that our approach will work. The probability is quite small, however, that each of $k$
given paths actually defines a region which both is isolated and contains at least one live cell. For
a region to be isolated, its boundary region must consist of at least $rd/8$ dead cells, where $r \ge d$.
The probability that all $krd/8$ cells are dead in the boundary regions of $k$ potentially isolated
regions with a boundary of length $r$ is $2^{-krd/8}$. Thus the probability that there are actually $k$
isolated regions, each containing one or more live cells, with outer boundaries of length $r$ is at
most $(N^k 2^{2rk - krd/8})/k!$, which for $k \ge eN/2^{rd/16}$ and $d > 32$ is less than $1/N^2$.


Observe that a region with an outer boundary of length $r$ contains $O(r^2)$ live cells. Thus for
$d > 32$, with probability $1 - O(1/N)$ at most

$$\sum_{r=d}^{N} \frac{eN}{2^{rd/16}} \, O(r^2) = O\left(\frac{d^2 N}{2^{d^2/16}}\right)$$

live cells are isolated from the largest component on the wafer, which implies that for $c < 1/16$
at most $O(2^{-cd^2} N)$ live cells are isolated. For $d \le 32$ the same result holds by simply adjusting
the constant hidden by the Big Oh. ∎


By choosing $d$ to be a sufficiently large constant, Lemma 3 ensures that with high probability, a
constant fraction of the live cells on the wafer can be connected into a spanning tree with constant
wire length. Because we know all wires will be constant length, Prim's minimum spanning tree
algorithm [28] can be modified to run in linear time instead of the normal $O(N^2)$.
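
To see why constant wire length gives linear time, note that each live cell has only $O(d^2)$ candidate neighbors. The sketch below is not the modified Prim's algorithm mentioned above; it swaps in a plain breadth-first search, which also yields a spanning tree with every wire of length at most $d$ in $O(N d^2)$ time. The function name `largest_tree`, the grid-of-booleans wafer, and the use of Manhattan distance between cell positions are assumptions made for illustration.

```python
from collections import deque


def largest_tree(alive, d):
    """Spanning tree of the largest group of live cells linked by hops <= d.

    `alive` is a list of rows of booleans. Two live cells are neighbors when
    their Manhattan distance is at most d. Returns (nodes, edges), where the
    edges form a tree on the nodes.
    """
    rows, cols = len(alive), len(alive[0])
    offsets = [(dr, dc)
               for dr in range(-d, d + 1)
               for dc in range(-d, d + 1)
               if 0 < abs(dr) + abs(dc) <= d]
    seen = set()
    best_nodes, best_edges = [], []
    for r in range(rows):
        for c in range(cols):
            if not alive[r][c] or (r, c) in seen:
                continue
            # Grow one component by breadth-first search from (r, c).
            seen.add((r, c))
            nodes, edges = [(r, c)], []
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                for dr, dc in offsets:
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and alive[nr][nc] and (nr, nc) not in seen):
                        seen.add((nr, nc))
                        nodes.append((nr, nc))
                        edges.append(((cr, cc), (nr, nc)))
                        queue.append((nr, nc))
            if len(nodes) > len(best_nodes):
                best_nodes, best_edges = nodes, edges
    return best_nodes, best_edges
```

Dividing `len(nodes)` by the total number of live cells gives the fraction captured by the main component, which Lemma 3 predicts approaches one rapidly as $d$ grows.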

Theorem 4. With probability $1 - O(1/N)$, any constant fraction (less than 1) of the live cells
on an $N$-cell wafer can be connected in a linear array with constant-length wires.

Proof. Straightforward from Lemmas 2 and 3. ∎

To  conclude  this  section,  we  provide  a  theorem  which  states  our  results  on  constructing  linear 
arrays  in  their  fullest  generality.  The  proof  is  similar  to  that  of  Lemma  3,  and  is  not  included 
here. 

Theorem 5. With probability $1 - O(1/N)$, at least $1 - \epsilon$ of the live cells on an $N$-cell wafer can
be connected in a linear array using wires of length $O(s\sqrt{\log_p \epsilon})$ and channels of width 2, where $p$
is the probability of a particular cell dying, $s$ is the side length of each cell, and $1/N < \epsilon < p < 1$.
This bound cannot be improved by more than a constant factor for any $p$, $\epsilon$, or $s$.


4. A lower bound for wafer-scale integration of two-dimensional systolic arrays

The  problem  of  linking  the  live  cells  on  a  wafer  to  form  a  square  two-dimensional  array  is 
substantially  more  difficult  than  the  corresponding  problem  for  linear  arrays.  The  main  difficulty 
with constructing two-dimensional arrays is that constant length wires no longer suffice when we
throw away some of the live cells. In this section we provide a lower bound on the length of the
longest wire required by a two-dimensional array. This bound was first discovered by Greene
and Gamal [8]. Our proof (which is similar to but more general than that in [8]) was obtained
independently  from  an  idea  due  to  Joel  Spencer  [40]. 

Theorem 6. With probability $1 - O(1/N)$ every realization of any $m$-cell two-dimensional array
on an $N$-cell wafer has a wire of length $\Omega(\sqrt{\lg m})$, for all $m = \Omega(\lg^2 N)$.

Proof. The proof consists of two parts. In the first, we show that with high probability, the
wafer contains a large number of regularly spaced square regions of $\frac{1}{4} \lg m$ cells, each of which is
dead. In the second part of the proof, we show that any realization of an $m$-cell two-dimensional
array must contain a cycle of four cells that surrounds the center of one of these dead regions.
Thus one of the wires in the 4-cycle will have length $\Omega(\sqrt{\lg m})$.
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First,  partition  the  N-cell  wafer  into  square  regions  with  m/32  cells  each,  and  then  partition 
each  of  these  regions  into  square  subregions  with  $  lg  m  cells  each.  We  claim  that  with  high 
probability,  every  m/32-cell  region  contains  a  }  Igm-cell  subregion  in  which  every  cell  is  dead, 
as  is  illustrated  in  Figure  0. 


The probability that any particular ¼ lg m-cell subregion contains at least one live cell is
1 − m^(−1/4), taking each cell to be dead with probability 1/2. Thus the probability that each of
the ¼ lg m-cell subregions in a particular m/32-cell region contains at least one live cell is

\[ \left(1 - m^{-1/4}\right)^{m/(8 \lg m)} \le e^{-m^{3/4}/(8 \lg m)}, \]

since 1 + x ≤ e^x for all x. The probability that one or more of the 32N/m m/32-cell regions
fails to contain a totally dead ¼ lg m-cell subregion is at most

\[ \frac{32N}{m} \, e^{-m^{3/4}/(8 \lg m)} = O(1/N) \]

for m = Ω(lg² N), which completes the first half of the proof.

If we can show that a 4-cycle of the two-dimensional array encloses the center of one of the
¼ lg m-cell dead regions, the proof will be complete because one of the wires of the 4-cycle will have
length Ω(√(lg m)). More generally, however, if any cycle in the array surrounds the center
of a dead subregion, then some wire in the array must have length Ω(√(lg m)). This observation
follows because

1) every directed cycle in a two-dimensional array can be decomposed into the sum of
directed 4-cycles, and

2) the number of times a cycle "wraps" around a point in the plane is equal to the sum of
the number of wraps for each 4-cycle in its decomposition.

Thus  a  two-dimensional  array  with  a  cycle  that  encloses  the  center  of  a  dead  region  must  also 
contain  a  4-cycle  that  surrounds  the  center  of  the  dead  region. 
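
The two observations above can be checked numerically on a small example. The following is a minimal sketch, not part of the original proof: the function names, the sample placement `pos`, and the point `hole` are all hypothetical, and the winding number is computed with the standard signed edge-crossing rule. It confirms that the number of wraps of the boundary cycle around a point equals the sum of the wraps of the individual 4-cycles.

```python
def winding_number(cycle, point):
    """Signed number of times the closed polygon `cycle` wraps around `point`.

    Counterclockwise wraps count +1 and clockwise wraps count -1.
    The point is assumed not to lie on the polygon itself.
    """
    px, py = point
    wraps = 0
    for k in range(len(cycle)):
        x0, y0 = cycle[k]
        x1, y1 = cycle[(k + 1) % len(cycle)]
        # Positive when `point` lies to the left of the directed edge.
        side = (x1 - x0) * (py - y0) - (px - x0) * (y1 - y0)
        if y0 <= py < y1 and side > 0:      # upward crossing, point on the left
            wraps += 1
        elif y1 <= py < y0 and side < 0:    # downward crossing, point on the right
            wraps -= 1
    return wraps


def four_cycles(pos):
    """All 4-cycles of the array, each traversed in the same direction."""
    rows, cols = len(pos), len(pos[0])
    return [[pos[r][c], pos[r][c + 1], pos[r + 1][c + 1], pos[r + 1][c]]
            for r in range(rows - 1) for c in range(cols - 1)]


def boundary_cycle(pos):
    """Outer cycle of the array, traversed in the same direction as the 4-cycles."""
    rows, cols = len(pos), len(pos[0])
    top = [pos[0][c] for c in range(cols)]
    right = [pos[r][cols - 1] for r in range(1, rows)]
    bottom = [pos[rows - 1][c] for c in range(cols - 2, -1, -1)]
    left = [pos[r][0] for r in range(rows - 2, 0, -1)]
    return top + right + bottom + left


# Hypothetical realization of a 3-by-3 array: pos[r][c] is the wafer
# location of array cell (r, c). The top-left 4-cycle is stretched
# around an assumed dead region centered at `hole`.
pos = [[(0, 3), (3, 3), (4, 3)],
       [(0, 0), (3, 0), (4, 0)],
       [(0, -1), (3, -1), (4, -1)]]
hole = (1.5, 1.5)

per_cycle = [winding_number(cycle, hole) for cycle in four_cycles(pos)]
outer = winding_number(boundary_cycle(pos), hole)
```

In this example only the stretched 4-cycle wraps around the hole, and its contribution accounts for the entire wrap count of the outer cycle, which is the situation the proof exploits.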

We must now show that with high probability, every realization of every m-cell two-dimensional
array contains a cycle that encloses the center of a square region of ¼ lg m dead cells. We
already know that with high probability a wafer contains a dead subregion of this size
in every square region of m/32 cells. Assume for the purposes of contradiction that an m-cell
two-dimensional array can be realized on such a wafer so that no cycle of the array surrounds the
center of one of the dead regions. Consider a line drawn between the centers of two dead regions.
If any wires cross this line, their removal will disconnect the two-dimensional array into two or
more components, as is shown in Figure 10.


Figure  10.  Disconnecting  a  two-dimensional  array. 


Among all pairs of neighboring dead regions (i.e., pairs contained in m/32-cell regions that
share an edge or corner), there is at least one pair for which removal of the wires passing between
them disconnects the array into two pieces, each with at least m/3 cells. Since at most 2√(m/32) =
√(m/8) wires can cross the line between the centers of two neighboring dead regions, by removing
only √(m/8) wires, we can disconnect an m-cell two-dimensional array into two pieces, each with
at least m/3 cells. But it is well known that any such disconnection requires √m wires to be
removed, and we have obtained the contradiction that completes the proof. ∎

The most interesting case of Theorem 6 is when the two-dimensional array to be constructed
has m = Θ(N) cells.

Corollary 7. With probability 1 − O(1/N), every realization of any two-dimensional array that
utilizes any constant fraction of the live cells on an N-cell wafer has a wire of length Ω(√(lg N)).


5. A divide-and-conquer method for constructing two-dimensional systolic arrays

The principal focus of this paper is the construction of systolic arrays on wafers such that
the wire length is minimized. In this section we ignore maximum wire length as a
cost measure and look at the problem of constructing systolic arrays when only channel width
is at issue. In doing so, we shall extend the general VLSI layout results of [19] and [21] to the
wafer-scale situation where some of the cells may be faulty. Furthermore, the analysis of this
section is worst case and not probabilistic, and thus all possible configurations of live and dead
cells, however unlikely, can be handled.

The basic result of this section is that a two-dimensional array can always be constructed from
all the live cells of an N-cell wafer if the channels have width O(lg N). This result will be used
in the next section as a subroutine in methods that achieve better bounds for wire length. The
divide-and-conquer technique used in the construction is similar to general VLSI layout methods
based on separators [21] and bifurcators [19].

We first prove a result on encoding two-dimensional arrays in complete binary trees à la
Rosenberg [31] where some of the leaves may be dead. An encoding of a graph G = (V, E) in
a tree T is a one-to-one mapping f from the vertices V to the leaves of T. In our case, f must
map V to live leaves of T. Such a mapping can be extended naturally to map E to the paths of
T, where f maps (u, v) to the unique simple path connecting f(u) to f(v).

Lemma 8. Let T be a complete binary tree with each of its N leaves labeled as either "live" or
"dead," and let M be the number of live leaves. Then for any M-element two-dimensional array
G, there exists an encoding f of G in T such that only O(√k) edges of E are mapped by f to an
edge of T that has k descendant leaves.

Proof. We rely on the fact that every tree has a weighted one-separator theorem [23]. That
is, if the vertices of the tree are given arbitrary weights, removal of a single vertex will partition
the tree into two components, each with less than two thirds the weight. In our case, we weight
the internal nodes and dead leaves of the tree T with zero and the live leaves with one. Then, in
fact, a single edge of the tree can be removed to split the tree into two components, each with at
least M/3 live leaves.
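
The edge-separator step can be sketched as follows for a complete binary tree. This is an illustrative sketch under stated assumptions, not the construction of [23]: the function name `separator_edge`, the heap-style node numbering (root 1, children 2v and 2v+1), and the descend-into-the-heavier-child rule are all choices made here for the example. Descending while a subtree holds more than two thirds of the live leaves stops at a node whose subtree holds between one third and two thirds of them, so cutting the edge above that node gives the desired split.

```python
def separator_edge(live):
    """Find an edge of a complete binary tree that splits the live leaves.

    `live` lists the leaves left to right (truthy = live); its length must
    be a power of two. Nodes are numbered heap-style: the root is 1 and the
    children of v are 2v and 2v+1. Returns (v, below, rest): cutting the
    edge between v and its parent leaves `below` live leaves under v and
    `rest` live leaves elsewhere, each at least a third of the total.
    """
    n = len(live)
    if n < 2 or n & (n - 1):
        raise ValueError("number of leaves must be a power of two, at least 2")

    # count[v] = number of live leaves in the subtree rooted at v.
    count = [0] * (2 * n)
    for i, alive in enumerate(live):
        count[n + i] = 1 if alive else 0
    for v in range(n - 1, 0, -1):
        count[v] = count[2 * v] + count[2 * v + 1]

    total = count[1]
    if total < 2:
        raise ValueError("need at least two live leaves to split")

    # Walk down into the heavier child while the subtree is too heavy.
    v = 1
    while 3 * count[v] > 2 * total:
        left, right = 2 * v, 2 * v + 1
        v = left if count[left] >= count[right] else right
    return v, count[v], total - count[v]
```

For example, with leaves `[1, 0, 1, 1, 0, 0, 0, 1]` the walk visits nodes 1, 2, and 5, and cutting the edge above node 5 leaves two live leaves on each side.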

The construction of f is obtained by a divide-and-conquer algorithm. We start by attempting
to encode the original two-dimensional array G in the original tree T. Of course, the number of
live cells on a wafer will rarely be a perfect square, and the subarrays corresponding to internal
nodes of the tree will not be square either. We shall allow the original array to be missing some
cells from the bottommost row and the rightmost column. Any subarray that is created may
be missing some cells from each of its four edges, and will in general have the shape shown in
Figure 11.

Each recursive iteration attempts to encode an m-element subarray Ĝ of G in a subtree T̂ of
T. Using the weighted separator for trees, we determine a single edge whose removal partitions
the tree T̂ into two subtrees, each with at least m/3 live leaves. The two-dimensional array Ĝ
is then cut into two subarrays with the corresponding number of live cells. By cutting parallel
to the short dimension, we can ensure that the subarrays of Ĝ have perimeter O(√m), which is
important for the analysis later on. Finally, the two subarrays of Ĝ are encoded recursively in
the two subtrees of T̂. The recursion terminates when the subarray to be encoded consists of a
single vertex v of G. When this happens, the edge immediately above the single live leaf of the
corresponding subtree of T is removed, and the vertex v is mapped to the live leaf.

Figure 11. A 6-by-6 array that is missing some border cells.

It remains to be shown that this encoding maps only O(√k) edges of the graph G to any edge
e leading out of any k-leaf subtree T′ of T. Look at e and those edges beneath it in T′ as they were
cut during the execution of the algorithm. (See Figure 12.) The first cut of one of these edges
partitions some subtree T̂ which contains T′ into two portions, each with at least one-third the
live leaves of T̂. One of the two pieces is a subtree T″ of T into which a subarray is encoded.
The number of connections from this subarray that pass through e is at worst the perimeter of
the subarray, and the number of elements in the subarray is at most k. (This worst case occurs
when T″ = T′ and e is the first edge cut.)


Figure 12. The relationships among trees in the proof of Lemma 8.


Therefore, only O(√k) connections from this subarray can possibly pass through e to T̂ − T″.
No other edges from within T″ pass through e to the rest of T̂, but some in T′ − T″ might. The
subtree T′ − T″, however, has at most two-thirds the live leaves of T′, and thus only O(√(2k/3))
additional connections corresponding to the second cut can pass through e. By induction, the
sum of the perimeters of all arrays that could pass through e is bounded by O(√k) because the
subarray sizes, and hence the perimeters, decrease geometrically.
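
The induction can be written out as a geometric series. The following worked bound is a sketch that assumes, as in the argument above, that the subarray associated with the i-th successive cut has at most (2/3)^i k elements and therefore perimeter O(√((2/3)^i k)):

```latex
\[
  \sum_{i \ge 0} O\!\left(\sqrt{(2/3)^{i}\, k}\right)
  \;=\; O\!\left(\sqrt{k}\right) \sum_{i \ge 0} \left(\sqrt{2/3}\right)^{i}
  \;=\; \frac{O\!\left(\sqrt{k}\right)}{1 - \sqrt{2/3}}
  \;=\; O\!\left(\sqrt{k}\right).
\]
```

The constant 1/(1 − √(2/3)) is roughly 5.45, so the total is within a small constant factor of the perimeter of the largest subarray.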


The encoding of a two-dimensional array in an N-leaf complete binary tree corresponds
naturally to an embedding of the array in an O(N)-leaf tree of meshes [17, 18, 19]. Figure 13
shows a 16-leaf tree of meshes. The root of the complete binary tree has O(√N) connections
passing through it from one side to the other. In the corresponding tree of meshes, the switching
of these connections is accomplished by an O(√N)-by-O(√N) mesh at the root. The two subtrees
of the root of the complete binary tree correspond recursively to the two subtrees of the root of
the tree of meshes. The leaves of the complete binary tree will be embedded in small meshes at
most a constant distance from the leaves of the tree of meshes because the mesh at the root of
the tree of meshes is a constant factor larger than √N-by-√N.


Figure 13. The 16-leaf tree of meshes.


The upper level meshes of the tree of meshes contain only wires, the bottom level meshes
are empty, and small meshes near the bottom contain the cells of the two-dimensional array. If
we chop off the unused lower level meshes, we obtain a shortened tree of meshes whose leaves
correspond to the cells of the two-dimensional array. The next lemma shows that shortened
trees of meshes can be embedded on a wafer with channels of width O(lg N).


Lemma 9. An N-leaf shortened tree of meshes can be constructed on an N-cell wafer that has
a uniform channel width of Θ(lg N) so that the leaves of the shortened tree of meshes correspond
in a one-to-one manner with the cells of the wafer.


Proof. The first step is to construct an O(lg N)-layer three-dimensional layout [20, 32] of the
shortened tree of meshes. Fold the connections between the root of the shortened tree of meshes
and each of its two sons so that the sons fit naturally on a second layer over the root. Fold the
connections to each of the grandsons so that they fit naturally over the sons on a third layer,
and so forth. This generates a Θ(lg N)-layer three-dimensional layout where each layer has linear
area. By projecting the three-dimensional layout onto a single layer in the manner of [42, pp.
36-38], channels with a uniform width of Θ(lg N) are obtained. ∎

The  next  theorem  is  the  major  result  of  this  section. 

Theorem 10. Any M-cell two-dimensional array can be constructed from any subset of the live
cells on an N-cell wafer using wires of length O(√N lg N) and channels of width O(lg N).

Proof. Immediate from Lemmas 8 and 9. ∎

By using two-color bisectors [3] or fully balanced bifurcators [10], the results of this section
can  be  generalised  to  the  encoding  or  embedding  of  classes  of  graphs  other  than  two-dimensional 
arrays.  The  general  idea  is  to  use  these  tools  to  bound  the  number  of  external  connections  from  a 
subgraph  during  the  divide-and-conquer  algorithm  in  the  proof  of  Lemma  8.  The  only  subtlety  is 
that  proportional  cuts  are  required,  which  involves  several  applications  of  the  two-color  bisector 
or  fully  balanced  bifurcator.  All  of  the  bounds  on  areas  of  graphs  reported  in  [19]  and  [21] 
can  then  be  obtained  in  the  wafer-scale  model  where  channels  have  uniform  width  and  cells  can 
be  defective.  (For  a  more  complete  description  of  how  these  techniques  can  be  used  to  embed 
arbitrary  graphs  in  a  fault-tolerant  manner,  see  [2].) 

6.  Upper  bounds  for  wafer-scale  integration  of  two-dimensional  systolic  arrays 

Theorem 6 from Section 4 gives a lower bound of Ω(√(lg N)) on the length of a wire in any
realization of a two-dimensional systolic array that utilizes all or most of the live cells of an N-cell
wafer. We do not know how to achieve this lower bound, but we can come close. This section
gives three nontrivial upper bounds for wire length and channel width. Of the three methods,
however, only the algorithm in the proof of Theorem 13 achieves the lower bound of Theorem 6.
Unfortunately, this algorithm utilizes only m = Θ(N/(lg lg N)²) of the live cells.

We  first  present  a  divide-and-conquer  algorithm  that  constructs  a  square  two-dimensional 
array  using  all  the  live  cells  on  a  wafer.  In  the  first  stage,  the  wafer  is  recursively  bisected,  and 
the  number  of  live  cells  in  each  half  is  counted.  Based  on  the  count  of  live  cells  in  each  half  of  the 
wafer,  the  algorithm  computes  the  dimensions  of  the  two  subarrays  that  must  be  constructed, 
and  then  recursively  constructs  the  subarrays.  The  two  subarrays  are  then  linked  together  to 
form  the  complete  array. 
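
The counting part of the first stage can be sketched as follows. This is a simplified illustration, not the full algorithm: it only bisects the wafer recursively and records the number of live cells in each half, leaving out the computation of subarray dimensions and the linking step. The function name `bisect_counts`, the dictionary layout, the cut-the-longer-side rule, and the 4-by-4 sample wafer are assumptions made for the example.

```python
def bisect_counts(wafer, threshold):
    """Recursively bisect a wafer and count the live cells in each piece.

    `wafer` is a list of rows of 0/1 values (1 = live cell). Regions with
    at most `threshold` cells are not split further; they stand in for the
    subproblems handed to the second stage.
    """
    def split(r0, r1, c0, c1):
        live = sum(wafer[r][c] for r in range(r0, r1) for c in range(c0, c1))
        node = {"rows": (r0, r1), "cols": (c0, c1), "live": live, "halves": []}
        cells = (r1 - r0) * (c1 - c0)
        if cells <= threshold or cells < 2:
            return node
        if c1 - c0 >= r1 - r0:
            # Vertical cut: left half and right half.
            mid = (c0 + c1) // 2
            node["halves"] = [split(r0, r1, c0, mid), split(r0, r1, mid, c1)]
        else:
            # Horizontal cut: top half and bottom half.
            mid = (r0 + r1) // 2
            node["halves"] = [split(r0, mid, c0, c1), split(mid, r1, c0, c1)]
        return node

    return split(0, len(wafer), 0, len(wafer[0]))


# Hypothetical 16-cell wafer with 11 live cells.
wafer = [[1, 0, 1, 1],
         [1, 1, 0, 1],
         [0, 1, 1, 0],
         [1, 1, 0, 1]]
tree = bisect_counts(wafer, threshold=4)
```

Here the first vertical cut finds 6 live cells on the left and 5 on the right; these counts are the inputs from which the algorithm described above would derive the shapes of the two subarrays.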

The algorithm remains in the first stage until subproblems with Θ(lg N) cells are encountered.
At this point the techniques used in Theorem 10 are used to complete the wiring of a Θ(lg N)-cell
subarray. The exact crossover point between the first and second stages can be set at subproblems
of size c lg N, where c is any constant sufficiently large to ensure that with high probability, every
c lg N-cell region contains Θ(lg N) live cells. (For example, a choice of c = 2 will suffice.)

Figures  14  through  17  illustrate  the  divide-and-conquer  procedure.  Figure  14a  shows  a  64-cell 
wafer  which  contains  36  live  cells.  In  what  follows,  we  step  through  the  algorithm  as  it  constructs 
a  6-by-6  array,  which  is  identified  as  the  “overall  target”  in  Figure  14b. 
Figure 14a. A 64-cell wafer that contains 36 live cells.

Figure 14b. The target: a 6-by-6 systolic array.


The first step is to bisect the wafer vertically, which gives 19 live cells in the left half and 17
in the right. The algorithm must therefore construct a 19-cell subarray in the left half wafer and
a 17-cell subarray in the right half wafer. Since we want the two subarrays to fit together nicely
after they have been constructed, we give them the shapes that are determined by the partition
of the 6-by-6 array shown in Figure 15.


In this example the number of cells is small enough that the second stage construction can be
performed by inspection. The inspection strategy can be used effectively in practice. Since the
second stage operates on regions of size Θ(lg N), the routings of this size can be precomputed.
The second stage then consists of a single table lookup. At worst, this strategy costs polynomial
time and space.

Figure 17 shows the final solution to the problem in Figure 14. For clarity the wires have not
been routed within the channels of the wafer. Notice that each quadrant contains the specified
targets for the second level of recursion. The dashed lines represent wires that connect cells in
different quadrants of the wafer.


Figure 17. Completed cell assignment and wiring of the 6-by-6 array.


The next theorem describes how well the divide-and-conquer algorithm performs with respect
to wire length and channel width.

Theorem 11. With probability 1 − O(1/N), a two-dimensional array can be constructed from
all the live cells on an N-cell wafer using wires of length O(lg N lg lg N) and channels of width
O(lg lg N).

Proof. The divide-and-conquer algorithm just described achieves the bounds in the theorem.
The analysis is divided into two parts corresponding to the two stages of the algorithm.

We begin at the first level of recursion. Consider the wires that link a cell in the left subarray
to a cell in the right subarray, as is illustrated by the two examples in Figure 18. For the most
part, the connecting wires can be routed in the channel that separates the left and right halves
of the wafer. The length of the longest wire in the channel, as well as the width of the channel
itself, is proportional to the longest vertical distance that a single wire must traverse.

The length of the longest wire in the center channel depends on the distribution of cells in
each quadrant. For example, if we are extremely lucky and the live cells are regularly spaced,
the longest wire may have constant length, as in Figure 18a. But if we are very unlucky, half the
live cells might occur in the upper right quadrant and the other half in the lower left quadrant
(Figure 18b). To connect the two halves, some wire will have length Ω(√N).


Figure  18b.  A  distribution  of  live  cells  which  requires  a  wide  center  channel. 
The length of the longest wire in the center channel can also be influenced by the distribution
of cells within a quadrant. For example, if the upper left quadrant contains N/8 live cells
(about the right number), but they are distributed as in Figure 19, then the center channel still
contains a wire of length Ω(√N).


Figure 19. Another distribution of live cells which requires a wide center channel.


Most often, we are not so unlucky that a wire in the center channel has length Ω(√N),
but neither are we lucky enough that all wires are constant length. We now show that with
high probability, we are more lucky than unlucky because the length of the longest wire in
the center channel is O(lg N). The key to the analysis is to prove that the live cells are
distributed so evenly that with high probability, the total vertical distortion of the wires in the
center channel (over all subproblems of size Ω(lg N)) is O(lg N). In order to do so, we first
observe that for all positive r, with probability 1 − O(e^(−r²)) the four quadrants in the ith
subproblem each have m/4 ± O(r√m) live cells, where m = N/2^(i−1). Thus with probability
1 − O(e^(−r²)), a subproblem contributes at most O(r) distortion of wires in the center channel
that are connected to the subarray at the level of the subproblem. There are O(lg N) subproblems
that can contribute to the distortion of a given wire in the center channel. Using standard
combinatorial arguments involving sums of random variables, it is now possible to show that the
worst case of the sum of the distortions is O(lg N) with probability 1 − O(1/N).

The same observations can be used to prove a high-probability bound of O(√(lg N lg m)) on
the distortion of wires that connect subarrays of size m = Ω(lg N). Thus it is sufficient that
the channels between subproblems with m cells have width O(√(lg N lg m)). By summing over all
Ω(lg N)-sized subproblems, it can be checked that at this point, the average channel width on the
wafer is O(1), which is because the channels inside O(lg N)-sized subproblems have not been used
at all. The constant average channel width can be achieved as a maximum without increasing
the length of any wire by more than O(lg N). The idea is to distribute the O(√(lg N lg m)) width
among the neighboring unused channels. As the details of this argument are somewhat
involved, we have omitted them. This concludes the first stage of the analysis.

The analysis of wires that link cells within a Θ(lg N)-cell subproblem differs substantially
from the preceding analysis because live cells within a small region can have arbitrarily irregular
distributions with high probability. The regions of irregularity are small enough, however, that
the worst-case distributions are not really all that bad. For example, if a Θ(lg N)-cell region has
the structure shown in Figures 18b or 19, then the maximum distortion of a wire at the top level
of the recursion is just O(√(lg N)).

In fact, the analysis of Section 5 ensures that the algorithm constructs a two-dimensional
array in each m = Θ(lg N)-cell region using wires of length O(√m lg m) = O(√(lg N) lg lg N) and
channels of width O(lg m) = O(lg lg N). Thus the entire two-dimensional array is constructed
using wires of length O(lg N lg lg N) and channels of width O(lg lg N). The extra lg lg N factor
in the wire length bound comes about because a wire with O(lg N) distortion crosses O(lg N)
channels, each of width O(lg lg N). ∎

The  wire  length  analysis  of  the  algorithm  in  Theorem  11  is  fairly  tight.  For  example,  the 
algorithm requires wires of length Ω(lg N) with high probability. Thus, if the lower bound in
Theorem  6  is  to  be  achieved,  a  different  algorithm  must  be  discovered.  It  may  be  possible  to 
improve  the  channel  width  bound,  however.  Any  improvement  in  Theorem  10  would  directly 
lead  to  an  improvement  in  the  channel  width  bounds  in  both  Theorem  11  and  the  next  theorem, 
which  shows  how  to  construct  a  two-dimensional  array  from  most  of  the  live  cells  on  a  wafer. 

Theorem 12. With probability 1 − O(1/N), a two-dimensional array can be constructed from
any constant fraction (less than 1) of the live cells on an N-cell wafer using wires of length
O(√(lg N) lg lg N) and channels of width O(lg lg N).

Proof. The key idea is to partition the wafer into N/(c lg N) square regions, each containing
m = c lg N cells, where c is a sufficiently large constant. With probability 1 − O(1/N), each of the
regions contains at least m′ = ½c(1 − 2/√c) lg N live cells. Using the technique of Theorem 10,
we can therefore construct an m′-cell two-dimensional array in each region using wires of length
O(√m lg m) = O(√(lg N) lg lg N) and channels of width O(lg m) = O(lg lg N). The N/(c lg N)
two-dimensional arrays are then connected together into one large array with ½N(1 − 2/√c)
live cells. The added wires have length at most O(√(lg N) lg lg N), and the channel width is not
substantially increased. ∎

For  each  of  the  two  previous  results,  the  channels  on  the  wafer  have  width  O(lglgN).  The 
next  theorem  shows  that  with  high  probability  a  two-dimensional  array  can  be  constructed  from 
many  of  the  live  cells  on  a  wafer  using  channels  of  unit  width.  Furthermore,  the  lower  bound  of 
Ω(√(lg N)) on wire length given in Theorem 6 is achieved by this construction.

Theorem 13. With probability 1 − O(1/N), at least a fraction Ω(1/(lg lg N)²) of the live cells on
an N-cell wafer can be connected into a two-dimensional array using wires of length O(√(lg N))
and channels of unit width.

Proof. The proof is similar to that of Theorem 12. As before, we partition the wafer into
square regions with c lg N cells each. The constant c must be chosen large enough to ensure that
with high probability, each region contains lg N live cells. We next partition each c lg N-cell
region into square subregions with c′(lg lg N)² cells each. Consider all pairs of indices i and j in
the range 1 ≤ i, j ≤ √c′ lg lg N. For a given region of c lg N cells, at least one pair (i, j) satisfies
the condition that at least 1/c of the cells in the (i, j) positions of the subregions are alive.
(Otherwise, it is impossible for lg N of the cells in the region to be alive.) Notice that by ignoring
those cells not in the (i, j) positions of a subregion, the (i, j)-positioned cells together with all of
the channels of the region form a "subwafer" with

1) m = c lg N/(c′(lg lg N)²) cells total,

2) at least m/c = lg N/(c′(lg lg N)²) live cells, and

3) channels of width √c′ lg lg N = Θ(lg m).

By choosing c′ large enough, the technique of Theorem 10 can be applied to construct within
each c lg N-cell region a two-dimensional array with lg N/(c′(lg lg N)²) cells using wires of length
O(√m lg m) = O(√(lg N)). These arrays can then be easily connected together to form a
two-dimensional array with N/(c c′(lg lg N)²) cells and wires of length O(√(lg N)). ∎
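
The pigeonhole step in the proof, choosing the pair (i, j), can be sketched as follows. This is an illustrative sketch only: the function name `best_offset`, the representation of a region as a square 0/1 grid, and the small sample region are assumptions, and the code uses 0-based positions where the proof uses 1-based indices. Because the positions partition the cells of the region, the best position holds at least the average number of live cells.

```python
def best_offset(region, q):
    """Pick the position (i, j) within q-by-q subregions with the most live cells.

    `region` is a square grid of 0/1 values (1 = live) whose side is a
    multiple of q. Returns (live, i, j), where `live` counts the live cells
    found at position (i, j) of every subregion.
    """
    side = len(region)
    if side % q != 0:
        raise ValueError("region side must be a multiple of q")
    k = side // q  # number of subregions along each side
    best = None
    for i in range(q):
        for j in range(q):
            live = sum(region[a * q + i][b * q + j]
                       for a in range(k) for b in range(k))
            if best is None or live > best[0]:
                best = (live, i, j)
    return best


# Hypothetical 16-cell region split into four 2-by-2 subregions.
region = [[1, 0, 1, 0],
          [0, 0, 0, 1],
          [1, 0, 1, 0],
          [0, 1, 0, 0]]
choice = best_offset(region, 2)
```

In this example the cells at position (0, 0) of the four subregions are all alive, so they form the subwafer that the proof would keep.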

By setting m = Ω(N/(lg lg N)²), it can be checked that Theorem 13 achieves the lower bounds
for wire length proved in Theorem 6. The cell utilization, however, leaves something to be desired.

We have summarized the results of this section in the following table. Each bound is achieved
with probability 1 − O(1/N), where N is the number of cells on the wafer, p is the probability
that a particular cell is dead, and s is the side length of each cell. (Wires are assumed to have
width one.)


Table 1. Bounds on wire length and channel width for two-dimensional arrays.

Portion of live cells used           Wire length                                  Channel width

All                                  O(log_{1/p} N (s + log₂ log_{1/p} N))        O(log₂ log_{1/p} N)

Constant fraction (less than one)    O(√(log_{1/p} N) (s + log₂ log_{1/p} N))     O(log₂ log_{1/p} N)

Ω(1/(log₂ log_{1/p} N)²)             O(√(log_{1/p} N))                            1


7.  Related  models  and  problems 

The  problem  of  incorporating  all  the  live  cells  of  a  wafer  into  a  linear  array  so  that  the 
maximum  wire  length  is  minimised  has  been  studied  in  non  standard  graph-theoretic  models 
and has come to be known as the Bottleneck Traveling Salesman problem [7]. In addition, the
wafer-scale model of N cells which fail independently with probability 1/2 is essentially equivalent
to the well-studied geometric model in which N points are thrown down randomly in a unit square
[1, 9, 13, 29, 37, 44]. Thus the algorithms for constructing linear arrays described in Section 3 can
also be applied to the Bottleneck Traveling Salesman problem in the geometric unit-square model.
For example, our results can be modified to show that with high probability, all of the points in
the unit square can be joined into a hamiltonian path using wires of length O(√(lg N)/√N), the
least possible. In addition, most of the points can be joined in a linear array using wires of length
O(1/√N), again the least possible. Although neither of these results has been explicitly stated
in the literature, the first result is really just a minor extension to the prior work of Karp [13] and
Bentley, Weide, and Yao [1]. The latter result of joining most of the points differs substantially
from previous work, however. To the best of our knowledge, the only result of a similar nature
is due to Erdős and Rényi [5] who showed that most graphs with N vertices and N edges have
large connected components.

Channel  widths  do  not  play  an  important  role  in  the  unit-square  model  because  the  lines 
drawn  between  points  are  infinitesimally  narrow.  Thus  the  algorithms  in  the  proofs  of  Theorems 
11  and  12  can  be  modified  to  construct  with  high  probability,  two-dimensional  arrays  containing: 

1) each of N points thrown randomly into a unit square using edges of length no more
than O(lg N/√N), and

2) any constant fraction (less than one) of N points thrown randomly into a unit square
using edges of length no more than O(√(lg N)/√N).

Since  the  lower  bound  of  Theorem  6  can  be  extended  to  the  unit-square  model,  the  second  result 
above  is  optimal,  and  thus  there  is  no  need  to  extend  the  result  of  Theorem  13. 

The problems considered heretofore in this paper also have an interpretation in a purely
graph-theoretic model. Suppose we are given a two-dimensional grid graph, and assume that each node
in the grid has independently a probability p of being bad. We wish to find a subgraph of the
grid that contains only good nodes and that forms a smaller two-dimensional grid. For example,
Figure 20 illustrates the embedding of a good three-by-three grid in a partially bad four-by-four
grid.


Figure  20.  A  good  3-by-3  grid  formed  in  a  partially  bad  4-by-4  grid.  Good  nodes  are  denoted 
by  black  dots. 


The objectives we might choose to optimise in such a problem are:

1) maximising the size of the good grid,

2)  minimising  the  maximum  distance  between  neighbors, 

3)  minimising  the  total  distance  between  all  pairs  of  neighbors,  and 

4) minimising the maximum number of times an edge in the partially bad grid is utilised.

These parameters roughly correspond in the wafer-scale model to the usage of live cells, maximum
wire  length,  total  wire  length,  and  maximum  channel  width,  respectively. 
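
As a rough illustration of the first three objectives, the Python sketch below scores a given embedding of a small grid in a partially bad host grid. The function name `embedding_costs`, the example bad nodes, and the example embedding are hypothetical and are not taken from Figure 20. The sketch also assumes that the distance between two host nodes is their Manhattan distance, and it does not route paths, so the fourth objective (edge usage) is not computed.

```python
from itertools import product


def embedding_costs(embedding, rows, cols, bad):
    """Score an embedding of a rows-by-cols grid in a partially bad host grid.

    embedding maps each small-grid node (i, j) to a host node (r, c).
    bad is the set of failed host nodes.
    """
    images = list(embedding.values())
    if len(embedding) != rows * cols or len(set(images)) != len(images):
        raise ValueError("embedding must map every small-grid node to a distinct host node")
    if any(node in bad for node in images):
        raise ValueError("embedding uses a bad host node")

    distances = []
    for i, j in product(range(rows), range(cols)):
        # Look at the right and lower neighbor so each grid edge is counted once.
        for ni, nj in ((i, j + 1), (i + 1, j)):
            if ni < rows and nj < cols:
                (r1, c1), (r2, c2) = embedding[(i, j)], embedding[(ni, nj)]
                # Assumption: distance is Manhattan distance in the host grid.
                distances.append(abs(r1 - r2) + abs(c1 - c2))

    return {
        "size": rows * cols,
        "max_distance": max(distances),
        "total_distance": sum(distances),
    }


# Hypothetical 4-by-4 host grid with three bad nodes.
EXAMPLE_BAD = {(1, 1), (2, 3), (3, 0)}

# Hypothetical 3-by-3 good grid; its middle row steps around bad node (1, 1).
EXAMPLE_EMBEDDING = {
    (0, 0): (0, 0), (0, 1): (0, 1), (0, 2): (0, 2),
    (1, 0): (1, 0), (1, 1): (1, 2), (1, 2): (1, 3),
    (2, 0): (2, 0), (2, 1): (2, 1), (2, 2): (2, 2),
}

if __name__ == "__main__":
    print(embedding_costs(EXAMPLE_EMBEDDING, 3, 3, EXAMPLE_BAD))
```

Comparing two candidate embeddings of the same size then amounts to comparing their `max_distance` and `total_distance` values.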

The  beauty  of  the  graph  theoretic  model,  however,  is  that  it  generalises  naturally  to  broader 
classes of graphs. For example, the same kinds of questions can be reasonably asked about
the class of k-dimensional grids for any k, the class of complete binary trees, or the class of
hypercubes. In each case, the appropriate problem might be:

"A network in the class is given, but some portion of the nodes fail. How do we use the edges
and good nodes of the network to construct a somewhat smaller network of the same type?"

For  linear  graphs  the  answer  to  the  question  is  straightforward.  This  paper  provides  a  starting 
point  for  two-dimensional  grids.  For  other  classes,  the  answers  are  as  yet  unknown.  Also  of 
interest is the problem of embedding a graph from one class in a partially bad graph from a
different class. Research in this area should lead to a greater understanding of the fault tolerance
of networks.
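
One natural reading of the linear case, offered here only as an assumption, is that consecutive good nodes are linked by skipping over the bad nodes between them. The Python sketch below follows that reading and reports the longest resulting link; the name `link_good_nodes` is hypothetical.

```python
def link_good_nodes(good):
    """Link consecutive good nodes of a linear graph.

    good is a list of booleans, one per node, in order along the line.
    Returns the positions of the good nodes and the longest link between
    consecutive good nodes, measured in edges of the original line.
    """
    live = [i for i, ok in enumerate(good) if ok]
    longest = max((b - a for a, b in zip(live, live[1:])), default=0)
    return live, longest
```

For example, `link_good_nodes([True, False, False, True, True, False, True])` keeps nodes 0, 3, 4, and 6, and its longest link spans three edges.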

t. Concluding remarks

For all the theoretical analysis in this paper, some of the algorithms described are quite
practical. Not only are they fast, but they produce good results because the constants are small.
For example, the methods of Section 3 can be used to show that there is a simple, linear-time
algorithm to connect  of the live cells on an N-cell wafer into a linear array using wires of
length 1, 3, or 3 and channels of width at most 2. The method from Section 6 for connecting
all the live cells into a two-dimensional array, modified to do table lookup on small subproblems,
appears to be substantially better than what has been used in practice [38].

In addition to providing algorithms for constructing one- and two-dimensional arrays, the
techniques used in this paper can also be used to construct arbitrary networks on integrated
circuit wafers. There are two ways this can be done. First, one could embed the desired network
in a two-dimensional array using the methods described in [2, 19, 21, 42, 43] and then construct
the two-dimensional array using the procedures from Section 4. Alternatively, one could apply
the divide-and-conquer process directly to the network. For example, the latter approach can be
used to construct with high probability a complete binary tree from the live cells using constant
channel width and edges of length $O(\sqrt{N}/\lg N)$, the least possible.

Some of the problems mentioned in this paper have been studied independently by Greene
and Gamal. In their recent paper [8], they prove most of the results found in Section 3 as well as
the lower bound in Section 4. Their analysis of linearly connected arrays is somewhat different
from ours, however, as they rely on percolation theory from statistical physics.

In addition, Manning [25, 38], Hedlund [10], Koren [14], and Fussell and Varman fS] look
at the basic problem of constructing arrays from defective arrays. Each gives algorithms but
little theoretical or statistical analysis. Rosenberg [33, 34] has also investigated issues of fault